Overview

Brought to you by YData

Dataset statistics

Number of variables: 14
Number of observations: 541,909
Missing cells: 1,454
Missing cells (%): < 0.1%
Duplicate rows: 4,879
Duplicate rows (%): 0.9%
Total size in memory: 185.5 MiB
Average record size in memory: 359.0 B
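As a quick consistency check, the percentages and the average record size can be recomputed from the counts above. This is a sketch that assumes what the figures imply: missing cells are counted over all rows × variables, duplicate rows over rows, and MiB means 1024² bytes.

```python
# Figures copied from the "Dataset statistics" block above.
n_rows = 541_909
n_vars = 14
missing_cells = 1_454
duplicate_rows = 4_879
total_mib = 185.5

# Missing cells relative to every cell in the table (rows x variables).
missing_pct = 100 * missing_cells / (n_rows * n_vars)
# Duplicate rows relative to the number of observations.
duplicate_pct = 100 * duplicate_rows / n_rows
# Average record size; 185.5 MiB is itself rounded, so this is approximate.
approx_record_bytes = total_mib * 1024 ** 2 / n_rows

print(f"Missing cells (%): {missing_pct:.3f}")              # 0.019, reported as "< 0.1%"
print(f"Duplicate rows (%): {duplicate_pct:.1f}")           # 0.9
print(f"Average record size: {approx_record_bytes:.0f} B")  # 359
```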

Variable types

Text: 3
Numeric: 7
DateTime: 1
Categorical: 3

Alerts

Dataset has 4,879 (0.9%) duplicate rows [Duplicates]
Month has a high overall correlation with Quarter [High correlation]
Quantity has a high overall correlation with TotalVentas [High correlation]
Quarter has a high overall correlation with Month [High correlation]
TotalVentas has a high overall correlation with Quantity [High correlation]
Country is highly imbalanced (85.9%) [Imbalance]
Year is highly imbalanced (60.4%) [Imbalance]
UnitPrice is highly skewed (γ1 = 186.5069717) [Skewed]
DayOfWeek has 95,111 (17.6%) zeros [Zeros]

Reproduction

Analysis started: 2025-03-21 11:49:12.044240
Analysis finished: 2025-03-21 11:49:40.134233
Duration: 28.09 seconds
Software version: ydata-profiling v4.15.0

Variables

InvoiceNo
Text

Distinct: 25900
Distinct (%): 4.8%
Missing: 0
Missing (%): 0.0%
Memory size: 32.6 MiB

Length

Max length: 7
Median length: 6
Mean length: 6.0171449
Min length: 6
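InvoiceNo values are 6 or 7 characters long, so the mean length pins down how many of them carry a seventh character. A small Python sketch of the length summary, plus that back-calculation; the sample invoice numbers are hypothetical:

```python
import statistics

def length_summary(values):
    """Max / median / mean / min string length, as in the report's "Length" block."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
    }

# Hypothetical mini-sample: three six-digit invoice numbers and one with a letter prefix.
sample = ["536365", "536366", "536367", "C536379"]
print(length_summary(sample))  # {'max': 7, 'median': 6.0, 'mean': 6.25, 'min': 6}

# Working backwards from the report: a mean length of 6.0171449 over 541,909 values
# that are all 6 or 7 characters long implies this many 7-character values.
n_rows = 541_909
seven_char_values = round((6.0171449 - 6) * n_rows)
total_chars = 6 * n_rows + seven_char_values
print(seven_char_values, total_chars)  # 9291 3260745
```

The result, 9,291 seven-character values and 3,260,745 characters in total, agrees with the "Total characters" figure and with the 9,291 count for the two non-digit characters in the character table, which is consistent with (though not proof of) each longer invoice number carrying a single letter.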

Characters and Unicode

Total characters: 3,260,745
Distinct characters: 12
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5,841
Unique (%): 1.1%

Sample

1st row: 536365
2nd row: 536365
3rd row: 536365
4th row: 536365
5th row: 536365

Most occurring words

Value  Count  Frequency (%)
573585  1114  0.2%
581219  749  0.1%
581492  731  0.1%
580729  721  0.1%
558475  705  0.1%
579777  687  0.1%
581217  676  0.1%
537434  675  0.1%
580730  662  0.1%
538071  652  0.1%
Other values (25890)  534537  98.6%

Most occurring characters

Value  Count  Frequency (%)
5  866996  26.6%
7  358618  11.0%
6  339129  10.4%
4  324436  9.9%
8  248810  7.6%
3  247661  7.6%
0  224299  6.9%
1  219402  6.7%
9  214831  6.6%
2  207272  6.4%
Other values (2)  9291  0.3%

Most occurring categories

Value  Count  Frequency (%)
(unknown)  3260745  100.0%

Most occurring scripts

Value  Count  Frequency (%)
(unknown)  3260745  100.0%

Most occurring blocks

Value  Count  Frequency (%)
(unknown)  3260745  100.0%
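The report counts characters across every value and groups them by Unicode category, script and block, but lists every group as "(unknown)" here without saying why. A sketch of the same kind of tally using Python's standard unicodedata module (a stand-in, not the profiler's own code): under it the digits fall in category Nd (decimal digit) and an uppercase letter in Lu. unicodedata has no script or block lookup, so only categories are shown, and the sample values are hypothetical.

```python
import unicodedata
from collections import Counter

def character_profile(values, top=10):
    """Count characters over all values; return the total, top characters and category totals."""
    chars = Counter()
    for value in values:
        chars.update(value)  # iterating a str yields its characters
    total = sum(chars.values())
    top_chars = [(c, n, round(100 * n / total, 1)) for c, n in chars.most_common(top)]
    by_category = Counter()
    for c, n in chars.items():
        by_category[unicodedata.category(c)] += n  # e.g. 'Nd' = decimal digit
    return total, top_chars, by_category

# Hypothetical sample of invoice numbers.
sample = ["536365", "536365", "536366", "C536379"]
total, top_chars, by_category = character_profile(sample, top=3)
print(total)        # 25
print(top_chars)    # [('3', 8, 32.0), ('6', 8, 32.0), ('5', 6, 24.0)]
print(by_category)  # Counter({'Nd': 24, 'Lu': 1})
```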

StockCode
Text

Distinct: 4070
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 32.1 MiB

Length

Max length: 12
Median length: 5
Mean length: 5.0868448
Min length: 1

Characters and Unicode

Total characters: 2,756,607
Distinct characters: 51
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 233
Unique (%): < 0.1%

Sample

1st row: 85123A
2nd row: 71053
3rd row: 84406B
4th row: 84029G
5th row: 84029E

Most occurring words

Value  Count  Frequency (%)
85123a  2380  0.4%
22423  2203  0.4%
85099b  2159  0.4%
47566  1727  0.3%
20725  1639  0.3%
84879  1502  0.3%
22720  1477  0.3%
22197  1476  0.3%
21212  1385  0.3%
20727  1350  0.2%
Other values (3949)  524648  96.8%

Most occurring characters

Value  Count  Frequency (%)
2  828325  30.0%
1  296053  10.7%
3  259035  9.4%
8  210898  7.7%
9  201222  7.3%
0  197322  7.2%
4  186057  6.7%
7  180372  6.5%
5  180005  6.5%
6  155713  5.6%
Other values (41)  61605  2.2%

Most occurring categories

Value  Count  Frequency (%)
(unknown)  2756607  100.0%

Most occurring scripts

Value  Count  Frequency (%)
(unknown)  2756607  100.0%

Most occurring blocks

Value  Count  Frequency (%)
(unknown)  2756607  100.0%

Description
Text

Distinct: 4223
Distinct (%): 0.8%
Missing: 1,454
Missing (%): 0.3%
Memory size: 43.2 MiB

Length

Max length: 35
Median length: 28
Mean length: 26.64378
Min length: 1

Characters and Unicode

Total characters: 14,399,764
Distinct characters: 77
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 308
Unique (%): 0.1%

Sample

1st row: WHITE HANGING HEART T-LIGHT HOLDER
2nd row: WHITE METAL LANTERN
3rd row: CREAM CUPID HEARTS COAT HANGER
4th row: KNITTED UNION FLAG HOT WATER BOTTLE
5th row: RED WOOLLY HOTTIE WHITE HEART.

Most occurring words

Value  Count  Frequency (%)
set  54599  2.3%
of  53351  2.3%
bag  51911  2.2%
red  42902  1.8%
heart  39163  1.7%
retrospot  35126  1.5%
vintage  33748  1.4%
design  30066  1.3%
pink  29526  1.2%
christmas  25131  1.1%
Other values (2449)  1973383  83.3%
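The word table counts lower-cased words across all descriptions, so its counts can exceed the number of rows. The report does not document its tokenizer; the sketch below assumes a plain whitespace split with leading and trailing punctuation trimmed, applied to the five sample descriptions shown above plus one missing value:

```python
import string
from collections import Counter

def word_counts(texts, top=10):
    """Lower-case each text, split on whitespace, trim edge punctuation, and count words."""
    counts = Counter()
    for text in texts:
        if text is None:  # missing descriptions are skipped
            continue
        for word in text.lower().split():
            word = word.strip(string.punctuation)  # "heart." -> "heart"; "t-light" is kept
            if word:
                counts[word] += 1
    return counts.most_common(top)

descriptions = [
    "WHITE HANGING HEART T-LIGHT HOLDER",
    "WHITE METAL LANTERN",
    "CREAM CUPID HEARTS COAT HANGER",
    "KNITTED UNION FLAG HOT WATER BOTTLE",
    "RED WOOLLY HOTTIE WHITE HEART.",
    None,
]
print(word_counts(descriptions, top=2))  # [('white', 3), ('heart', 2)]
```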

Most occurring characters

Value  Count  Frequency (%)
(space)  1966406  13.7%
E  1288969  9.0%
A  1093609  7.6%
T  956778  6.6%
R  918258  6.4%
O  864963  6.0%
I  788099  5.5%
S  777550  5.4%
N  716689  5.0%
L  705042  4.9%
Other values (67)  4323401  30.0%

Most occurring categories

Value  Count  Frequency (%)
(unknown)  14399764  100.0%

Most occurring scripts

Value  Count  Frequency (%)
(unknown)  14399764  100.0%

Most occurring blocks

Value  Count  Frequency (%)
(unknown)  14399764  100.0%

Quantity
Real number (ℝ)

High correlation 

Distinct: 722
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.5522495
Minimum: -80995
Maximum: 80995
Zeros: 0
Zeros (%): 0.0%
Negative: 10624
Negative (%): 2.0%
Memory size: 4.1 MiB

Quantile statistics

Minimum: -80995
5th percentile: 1
Q1: 1
Median: 3
Q3: 10
95th percentile: 29
Maximum: 80995
Range: 161990
Interquartile range (IQR): 9
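Range is driven entirely by the extremes (80995 - (-80995) = 161990), while the interquartile range (Q3 - Q1 = 10 - 1 = 9) ignores them. The report does not say which interpolation rule it uses for quantiles; the sketch below assumes Python's statistics.quantiles with method="inclusive" (linear interpolation between order statistics), so results on small samples may differ slightly from other tools. The quantities are hypothetical:

```python
import statistics

def quantile_summary(values):
    """Quartiles, range and IQR in the layout of the "Quantile statistics" block."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    lo, hi = min(values), max(values)
    return {"min": lo, "q1": q1, "median": median, "q3": q3, "max": hi,
            "range": hi - lo, "iqr": q3 - q1}

# Hypothetical quantities: mostly small values plus two opposite extremes.
quantities = [-80995, 1, 1, 1, 2, 3, 6, 10, 12, 80995]
print(quantile_summary(quantities))
# {'min': -80995, 'q1': 1.0, 'median': 2.5, 'q3': 9.0, 'max': 80995, 'range': 161990, 'iqr': 8.0}
```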

Descriptive statistics

Standard deviation: 218.08116
Coefficient of variation (CV): 22.830346
Kurtosis: 119769.16
Mean: 9.5522495
Median Absolute Deviation (MAD): 2
Skewness: -0.26407631
Sum: 5176450
Variance: 47559.391
Monotonicity: Not monotonic
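Two of these figures follow from the others: the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. The sketch below assumes the sample (n - 1) definitions, which is pandas' default; the report itself does not state which it uses. The input list is hypothetical:

```python
import math
import statistics

def descriptive_summary(values):
    """Mean, sample variance/std, CV and MAD, as in the "Descriptive statistics" block."""
    mean = statistics.fmean(values)
    variance = statistics.variance(values)  # sample variance, divides by n - 1
    std = math.sqrt(variance)
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])  # median absolute deviation
    return {"mean": mean, "std": std, "variance": variance,
            "cv": std / mean, "mad": mad, "sum": sum(values)}

print(descriptive_summary([1, 2, 2, 3, 12]))

# Cross-check of the reported Quantity figures: CV = std / mean and variance = std ** 2.
print(round(218.08116 / 9.5522495, 2))  # 22.83
print(round(218.08116 ** 2, 2))         # 47559.39
```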
Histogram with fixed size bins (bins=50)

Common Values

Value  Count  Frequency (%)
1  148227  27.4%
2  81829  15.1%
12  61063  11.3%
6  40868  7.5%
4  38484  7.1%
3  37121  6.9%
24  24021  4.4%
10  22288  4.1%
8  13129  2.4%
5  11757  2.2%
Other values (712)  63122  11.6%

Minimum Values

Value  Count  Frequency (%)
-80995  1  < 0.1%
-74215  1  < 0.1%
-9600  2  < 0.1%
-9360  1  < 0.1%
-9058  1  < 0.1%
-5368  1  < 0.1%
-4830  1  < 0.1%
-3667  1  < 0.1%
-3167  1  < 0.1%
-3114  1  < 0.1%

Maximum Values

Value  Count  Frequency (%)
80995  1  < 0.1%
74215  1  < 0.1%
12540  1  < 0.1%
5568  1  < 0.1%
4800  1  < 0.1%
4300  1  < 0.1%
4000  1  < 0.1%
3906  1  < 0.1%
3186  1  < 0.1%
3114  2  < 0.1%

InvoiceDate
DateTime

Distinct: 23260
Distinct (%): 4.3%
Missing: 0
Missing (%): 0.0%
Memory size: 4.1 MiB
Minimum: 2010-12-01 08:26:00
Maximum: 2011-12-09 12:50:00
Invalid dates: 0
Invalid dates (%): 0.0%
Histogram with fixed size bins (bins=50)
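The sample rows are consistent with Year, Month, Quarter, DayOfWeek and Hour having been derived from InvoiceDate, with DayOfWeek counting Monday as 0 (the convention of Python's datetime.weekday() and pandas' dt.dayofweek). The report does not state this; it is an inference from the sample rows. Under that reading, the DayOfWeek zeros alert counts Monday invoices rather than missing data, and the absent value 5 corresponds to Saturday. A sketch of the derivation:

```python
from datetime import datetime

def date_parts(ts):
    """Calendar fields matching the derived columns in this dataset (an assumption)."""
    return {
        "Year": ts.year,
        "Month": ts.month,
        "Quarter": (ts.month - 1) // 3 + 1,
        "DayOfWeek": ts.weekday(),  # Monday = 0 ... Sunday = 6
        "Hour": ts.hour,
    }

first = datetime(2010, 12, 1, 8, 26)  # InvoiceDate of the first sample row
last = datetime(2011, 12, 9, 12, 50)  # InvoiceDate of the last sample row
print(date_parts(first))  # {'Year': 2010, 'Month': 12, 'Quarter': 4, 'DayOfWeek': 2, 'Hour': 8}
print(date_parts(last))   # {'Year': 2011, 'Month': 12, 'Quarter': 4, 'DayOfWeek': 4, 'Hour': 12}
```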

UnitPrice
Real number (ℝ)

Skewed 

Distinct: 1630
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.6111136
Minimum: -11062.06
Maximum: 38970
Zeros: 2515
Zeros (%): 0.5%
Negative: 2
Negative (%): < 0.1%
Memory size: 4.1 MiB

Quantile statistics

Minimum: -11062.06
5th percentile: 0.42
Q1: 1.25
Median: 2.08
Q3: 4.13
95th percentile: 9.95
Maximum: 38970
Range: 50032.06
Interquartile range (IQR): 2.88

Descriptive statistics

Standard deviation: 96.759853
Coefficient of variation (CV): 20.984053
Kurtosis: 59005.719
Mean: 4.6111136
Median Absolute Deviation (MAD): 1.23
Skewness: 186.50697
Sum: 2498804
Variance: 9362.4692
Monotonicity: Not monotonic
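The γ1 in the UnitPrice alert is the skewness figure above. The report does not name its estimator; the sketch below assumes the bias-adjusted Fisher-Pearson skewness (G1) and excess kurtosis (G2), the definitions pandas describes for Series.skew() and Series.kurt(). The hypothetical price list shows how a single very large value among small ones is enough to push skewness well above 2:

```python
import math

def _central_moments(values):
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return n, m2, m3, m4

def skewness(values):
    """Bias-adjusted Fisher-Pearson skewness G1 (needs at least 3 values)."""
    n, m2, m3, _ = _central_moments(values)
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1

def excess_kurtosis(values):
    """Bias-adjusted excess kurtosis G2, 0 for a normal distribution (needs at least 4 values)."""
    n, m2, _, m4 = _central_moments(values)
    g2 = m4 / m2 ** 2 - 3
    return ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))

symmetric = [1, 2, 3, 4, 5]
prices = [1.25, 1.65, 2.08, 2.95, 4.95, 38970.0]  # hypothetical: common prices plus the maximum
print(skewness(symmetric), excess_kurtosis(symmetric))  # 0.0 and about -1.2
print(round(skewness(prices), 2))  # strongly positive
```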
Histogram with fixed size bins (bins=50)

Common Values

Value  Count  Frequency (%)
1.25  50496  9.3%
1.65  38181  7.0%
0.85  28497  5.3%
2.95  27768  5.1%
0.42  24533  4.5%
4.95  19040  3.5%
3.75  18600  3.4%
2.1  17697  3.3%
2.46  17091  3.2%
2.08  17005  3.1%
Other values (1620)  283001  52.2%

Minimum Values

Value  Count  Frequency (%)
-11062.06  2  < 0.1%
0  2515  0.5%
0.001  4  < 0.1%
0.01  1  < 0.1%
0.03  3  < 0.1%
0.04  66  < 0.1%
0.06  117  < 0.1%
0.07  9  < 0.1%
0.08  56  < 0.1%
0.09  2  < 0.1%

Maximum Values

Value  Count  Frequency (%)
38970  1  < 0.1%
17836.46  1  < 0.1%
16888.02  1  < 0.1%
16453.71  1  < 0.1%
13541.33  3  < 0.1%
13474.79  1  < 0.1%
11586.5  1  < 0.1%
11062.06  1  < 0.1%
8286.22  1  < 0.1%
8142.75  2  < 0.1%

CustomerID
Real number (ℝ)

Distinct: 4372
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15253.867
Minimum: 12346
Maximum: 18287
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.1 MiB

Quantile statistics

Minimum: 12346
5th percentile: 12709
Q1: 14367
Median: 15152
Q3: 16255
95th percentile: 17841
Maximum: 18287
Range: 5941
Interquartile range (IQR): 1888

Descriptive statistics

Standard deviation: 1485.9059
Coefficient of variation (CV): 0.097411746
Kurtosis: -0.57699687
Mean: 15253.867
Median Absolute Deviation (MAD): 941
Skewness: 0.10246327
Sum: 8.266208 × 10⁹
Variance: 2207916.2
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value  Count  Frequency (%)
15152  135358  25.0%
17841  7983  1.5%
14911  5903  1.1%
14096  5128  0.9%
12748  4642  0.9%
14606  2782  0.5%
15311  2491  0.5%
14646  2085  0.4%
13089  1857  0.3%
13263  1677  0.3%
Other values (4362)  372003  68.6%

Minimum Values

Value  Count  Frequency (%)
12346  2  < 0.1%
12347  182  < 0.1%
12348  31  < 0.1%
12349  73  < 0.1%
12350  17  < 0.1%
12352  95  < 0.1%
12353  4  < 0.1%
12354  58  < 0.1%
12355  13  < 0.1%
12356  59  < 0.1%

Maximum Values

Value  Count  Frequency (%)
18287  70  < 0.1%
18283  756  0.1%
18282  13  < 0.1%
18281  7  < 0.1%
18280  10  < 0.1%
18278  9  < 0.1%
18277  9  < 0.1%
18276  16  < 0.1%
18274  22  < 0.1%
18273  3  < 0.1%

Country
Categorical

Imbalance 

Distinct: 38
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 36.4 MiB
United Kingdom  495478
Germany  9495
France  8557
EIRE  8196
Spain  2533
Other values (33)  17650

Length

Max length: 20
Median length: 14
Mean length: 13.376203
Min length: 3

Characters and Unicode

Total characters: 7,248,685
Distinct characters: 41
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: United Kingdom
2nd row: United Kingdom
3rd row: United Kingdom
4th row: United Kingdom
5th row: United Kingdom

Common Values

Value  Count  Frequency (%)
United Kingdom  495478  91.4%
Germany  9495  1.8%
France  8557  1.6%
EIRE  8196  1.5%
Spain  2533  0.5%
Netherlands  2371  0.4%
Belgium  2069  0.4%
Switzerland  2002  0.4%
Portugal  1519  0.3%
Australia  1259  0.2%
Other values (28)  8430  1.6%

Length

Histogram of lengths of the category

Most occurring words

Value  Count  Frequency (%)
united  495546  47.7%
kingdom  495478  47.7%
germany  9495  0.9%
france  8557  0.8%
eire  8196  0.8%
spain  2533  0.2%
netherlands  2371  0.2%
belgium  2069  0.2%
switzerland  2002  0.2%
portugal  1519  0.1%
Other values (35)  10904  1.0%

Most occurring characters

Value  Count  Frequency (%)
n  1023046  14.1%
i  1001404  13.8%
d  998442  13.8%
e  526754  7.3%
m  507621  7.0%
t  504192  7.0%
g  499871  6.9%
o  499396  6.9%
(space)  496761  6.9%
U  496283  6.8%
Other values (31)  694915  9.6%

Most occurring categories

Value  Count  Frequency (%)
(unknown)  7248685  100.0%

Most occurring scripts

Value  Count  Frequency (%)
(unknown)  7248685  100.0%

Most occurring blocks

Value  Count  Frequency (%)
(unknown)  7248685  100.0%

TotalVentas
Real number (ℝ)

High correlation 

Distinct: 6204
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 17.987795
Minimum: -168469.6
Maximum: 168469.6
Zeros: 2515
Zeros (%): 0.5%
Negative: 9290
Negative (%): 1.7%
Memory size: 4.1 MiB

Quantile statistics

Minimum: -168469.6
5th percentile: 0.83
Q1: 3.4
Median: 9.75
Q3: 17.4
95th percentile: 59.4
Maximum: 168469.6
Range: 336939.2
Interquartile range (IQR): 14

Descriptive statistics

Standard deviation: 378.81082
Coefficient of variation (CV): 21.059325
Kurtosis: 151198
Mean: 17.987795
Median Absolute Deviation (MAD): 6.75
Skewness: -0.96438918
Sum: 9747747.9
Variance: 143497.64
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value  Count  Frequency (%)
15  20267  3.7%
1.25  9550  1.8%
2.46  9275  1.7%
17.7  9250  1.7%
4.13  8811  1.6%
16.5  8533  1.6%
10.2  8099  1.5%
19.8  7676  1.4%
3.75  7455  1.4%
3.29  6522  1.2%
Other values (6194)  446471  82.4%

Minimum Values

Value  Count  Frequency (%)
-168469.6  1  < 0.1%
-77183.6  1  < 0.1%
-38970  1  < 0.1%
-17836.46  1  < 0.1%
-16888.02  1  < 0.1%
-16453.71  1  < 0.1%
-13541.33  2  < 0.1%
-13474.79  1  < 0.1%
-11586.5  1  < 0.1%
-11062.06  2  < 0.1%

Maximum Values

Value  Count  Frequency (%)
168469.6  1  < 0.1%
77183.6  1  < 0.1%
38970  1  < 0.1%
13541.33  1  < 0.1%
11062.06  1  < 0.1%
8142.75  1  < 0.1%
7144.72  1  < 0.1%
6539.4  2  < 0.1%
4992  1  < 0.1%
4921.5  1  < 0.1%

Year
Categorical

Imbalance 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 32.6 MiB
2011.0  499428
2010.0  42481

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Characters and Unicode

Total characters: 3,251,454
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2010.0
2nd row: 2010.0
3rd row: 2010.0
4th row: 2010.0
5th row: 2010.0

Common Values

Value  Count  Frequency (%)
2011.0  499428  92.2%
2010.0  42481  7.8%
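The report does not define its imbalance percentage. One definition that reproduces the Year figure to the reported precision is one minus the Shannon entropy of the category shares divided by log k, where k is the number of categories; treat it as an assumption checked only against these counts. Applied to the Quarter counts it gives a much lower score, consistent with Quarter not being flagged:

```python
import math

def imbalance_score(counts):
    """1 - (Shannon entropy of the category shares / log k); 0 = balanced, 1 = all in one category."""
    if len(counts) < 2:
        raise ValueError("need at least two categories")
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(len(counts))

year_score = imbalance_score([499428, 42481])                     # Year: 2011.0, 2010.0
quarter_score = imbalance_score([213459, 125028, 103820, 99602])  # Quarter: 4.0, 3.0, 2.0, 1.0
print(f"Year: {year_score:.1%}")        # Year: 60.4%
print(f"Quarter: {quarter_score:.1%}")  # Quarter: 3.8%
```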

Length

Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
0  1126299  34.6%
1  1041337  32.0%
2  541909  16.7%
.  541909  16.7%

Most occurring categories

Value  Count  Frequency (%)
(unknown)  3251454  100.0%

Most occurring scripts

Value  Count  Frequency (%)
(unknown)  3251454  100.0%

Most occurring blocks

Value  Count  Frequency (%)
(unknown)  3251454  100.0%

Month
Real number (ℝ)

High correlation 

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.5531279
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.1 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 5
Median: 8
Q3: 11
95th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.5090554
Coefficient of variation (CV): 0.46458307
Kurtosis: -1.1200445
Mean: 7.5531279
Median Absolute Deviation (MAD): 3
Skewness: -0.41481291
Sum: 4093108
Variance: 12.31347
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)

Common Values

Value  Count  Frequency (%)
11  84711  15.6%
12  68006  12.5%
10  60742  11.2%
9  50226  9.3%
7  39518  7.3%
5  37030  6.8%
6  36874  6.8%
3  36748  6.8%
8  35284  6.5%
1  35147  6.5%
Other values (2)  57623  10.6%

Minimum Values

Value  Count  Frequency (%)
1  35147  6.5%
2  27707  5.1%
3  36748  6.8%
4  29916  5.5%
5  37030  6.8%
6  36874  6.8%
7  39518  7.3%
8  35284  6.5%
9  50226  9.3%
10  60742  11.2%

Maximum Values

Value  Count  Frequency (%)
12  68006  12.5%
11  84711  15.6%
10  60742  11.2%
9  50226  9.3%
8  35284  6.5%
7  39518  7.3%
6  36874  6.8%
5  37030  6.8%
4  29916  5.5%
3  36748  6.8%

Quarter
Categorical

High correlation 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 31.0 MiB
4.0  213459
3.0  125028
2.0  103820
1.0  99602

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 1,625,727
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 4.0
2nd row: 4.0
3rd row: 4.0
4th row: 4.0
5th row: 4.0

Common Values

Value  Count  Frequency (%)
4.0  213459  39.4%
3.0  125028  23.1%
2.0  103820  19.2%
1.0  99602  18.4%

Length

Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
.  541909  33.3%
0  541909  33.3%
4  213459  13.1%
3  125028  7.7%
2  103820  6.4%
1  99602  6.1%

Most occurring categories

Value  Count  Frequency (%)
(unknown)  1625727  100.0%

Most occurring scripts

Value  Count  Frequency (%)
(unknown)  1625727  100.0%

Most occurring blocks

Value  Count  Frequency (%)
(unknown)  1625727  100.0%

DayOfWeek
Real number (ℝ)

Zeros 

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.4312772
Minimum: 0
Maximum: 6
Zeros: 95111
Zeros (%): 17.6%
Negative: 0
Negative (%): 0.0%
Memory size: 4.1 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
Median: 2
Q3: 4
95th percentile: 6
Maximum: 6
Range: 6
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.8447087
Coefficient of variation (CV): 0.75874058
Kurtosis: -0.6568368
Mean: 2.4312772
Median Absolute Deviation (MAD): 1
Skewness: 0.46719471
Sum: 1317531
Variance: 3.4029502
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)

Common Values

Value  Count  Frequency (%)
3  103857  19.2%
1  101808  18.8%
0  95111  17.6%
2  94565  17.5%
4  82193  15.2%
6  64375  11.9%

Minimum Values

Value  Count  Frequency (%)
0  95111  17.6%
1  101808  18.8%
2  94565  17.5%
3  103857  19.2%
4  82193  15.2%
6  64375  11.9%

Maximum Values

Value  Count  Frequency (%)
6  64375  11.9%
4  82193  15.2%
3  103857  19.2%
2  94565  17.5%
1  101808  18.8%
0  95111  17.6%

Hour
Real number (ℝ)

Distinct: 15
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.078729
Minimum: 6
Maximum: 20
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.1 MiB

Quantile statistics

Minimum: 6
5th percentile: 9
Q1: 11
Median: 13
Q3: 15
95th percentile: 17
Maximum: 20
Range: 14
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.4432701
Coefficient of variation (CV): 0.1868125
Kurtosis: -0.68580894
Mean: 13.078729
Median Absolute Deviation (MAD): 2
Skewness: 0.0055453915
Sum: 7087481
Variance: 5.9695688
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=15)

Common Values

Value  Count  Frequency (%)
12  78709  14.5%
15  77519  14.3%
13  72259  13.3%
14  67471  12.5%
11  57674  10.6%
16  54516  10.1%
10  49037  9.0%
9  34332  6.3%
17  28509  5.3%
8  8909  1.6%
Other values (5)  12974  2.4%

Minimum Values

Value  Count  Frequency (%)
6  41  < 0.1%
7  383  0.1%
8  8909  1.6%
9  34332  6.3%
10  49037  9.0%
11  57674  10.6%
12  78709  14.5%
13  72259  13.3%
14  67471  12.5%
15  77519  14.3%

Maximum Values

Value  Count  Frequency (%)
20  871  0.2%
19  3705  0.7%
18  7974  1.5%
17  28509  5.3%
16  54516  10.1%
15  77519  14.3%
14  67471  12.5%
13  72259  13.3%
12  78709  14.5%
11  57674  10.6%

Interactions

Correlations

                  Country  CustomerID   DayOfWeek        Hour       Month    Quantity     Quarter TotalVentas   UnitPrice        Year
Country             1.000       0.287       0.057       0.081       0.058       0.042       0.057       0.030       0.006       0.051
CustomerID          0.287       1.000       0.017       0.044       0.028      -0.109       0.043      -0.131      -0.014       0.078
DayOfWeek           0.057       0.017       1.000      -0.042       0.036       0.017       0.041      -0.015      -0.035       0.038
Hour                0.081       0.044      -0.042       1.000       0.027      -0.210       0.063      -0.199       0.026       0.059
Month               0.058       0.028       0.036       0.027       1.000      -0.025       1.000      -0.031      -0.003       0.466
Quantity            0.042      -0.109       0.017      -0.210      -0.025       1.000       0.008       0.692      -0.385       0.000
Quarter             0.057       0.043       0.041       0.063       1.000       0.008       1.000       0.005       0.002       0.362
TotalVentas         0.030      -0.131      -0.015      -0.199      -0.031       0.692       0.005       1.000       0.327       0.000
UnitPrice           0.006      -0.014      -0.035       0.026      -0.003      -0.385       0.002       0.327       1.000       0.007
Year                0.051       0.078       0.038       0.059       0.466       0.000       0.362       0.000       0.007       1.000
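The report does not say which coefficient produced each cell. Quarter is categorical and, in the sample rows, appears to be a function of Month; for such a pair a contingency-table measure like Cramér's V equals exactly 1, matching the 1.000 entries. The sketch below illustrates that under this assumption, using uncorrected Cramér's V on hypothetical data; it is not a reconstruction of the profiler's internals:

```python
import math
from collections import Counter

def cramers_v(x, y):
    """Cramér's V between two categorical sequences (plain chi-squared, no bias correction)."""
    n = len(x)
    joint = Counter(zip(x, y))
    count_x, count_y = Counter(x), Counter(y)
    k = min(len(count_x), len(count_y)) - 1
    if k == 0:
        raise ValueError("both variables need at least two distinct values")
    chi2 = 0.0
    for a, na in count_x.items():
        for b, nb in count_y.items():
            expected = na * nb / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))

months = list(range(1, 13)) * 10               # hypothetical: every month ten times
quarters = [(m - 1) // 3 + 1 for m in months]  # Quarter as a function of Month
print(cramers_v(months, quarters))  # 1.0
```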

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
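In this report only Description has gaps (1,454 missing values, the same as the dataset-wide missing-cell count). A minimal sketch of a per-column nullity count over plain Python dicts, assuming None marks a missing value (pandas would also treat NaN as missing); the records are hypothetical:

```python
def missing_by_column(rows, columns):
    """Count None entries per column, overall, and as a share of all cells."""
    per_column = {col: sum(1 for row in rows if row.get(col) is None) for col in columns}
    missing = sum(per_column.values())
    pct = 100 * missing / (len(rows) * len(columns))
    return per_column, missing, pct

columns = ["InvoiceNo", "Description", "CustomerID"]
rows = [  # hypothetical records; the None stands in for a missing Description
    {"InvoiceNo": "536365", "Description": "WHITE METAL LANTERN", "CustomerID": 17850.0},
    {"InvoiceNo": "536366", "Description": None, "CustomerID": 17850.0},
    {"InvoiceNo": "536367", "Description": "ASSORTED COLOUR BIRD ORNAMENT", "CustomerID": 13047.0},
]
per_column, missing, pct = missing_by_column(rows, columns)
print(per_column)              # {'InvoiceNo': 0, 'Description': 1, 'CustomerID': 0}
print(missing, round(pct, 2))  # 1 11.11
```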

Sample

First rows

Row | InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country | TotalVentas | Year | Month | Quarter | DayOfWeek | Hour
0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6.0 | 2010-12-01 08:26:00 | 2.55 | 17850.0 | United Kingdom | 15.30 | 2010.0 | 12.0 | 4.0 | 2.0 | 8.0
1 | 536365 | 71053 | WHITE METAL LANTERN | 6.0 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom | 20.34 | 2010.0 | 12.0 | 4.0 | 2.0 | 8.0
2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8.0 | 2010-12-01 08:26:00 | 2.75 | 17850.0 | United Kingdom | 22.00 | 2010.0 | 12.0 | 4.0 | 2.0 | 8.0
3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6.0 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom | 20.34 | 2010.0 | 12.0 | 4.0 | 2.0 | 8.0
4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6.0 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom | 20.34 | 2010.0 | 12.0 | 4.0 | 2.0 | 8.0
5 | 536365 | 22752 | SET 7 BABUSHKA NESTING BOXES | 2.0 | 2010-12-01 08:26:00 | 7.65 | 17850.0 | United Kingdom | 15.30 | 2010.0 | 12.0 | 4.0 | 2.0 | 8.0
6 | 536365 | 21730 | GLASS STAR FROSTED T-LIGHT HOLDER | 6.0 | 2010-12-01 08:26:00 | 4.25 | 17850.0 | United Kingdom | 25.50 | 2010.0 | 12.0 | 4.0 | 2.0 | 8.0
7 | 536366 | 22633 | HAND WARMER UNION JACK | 6.0 | 2010-12-01 08:28:00 | 1.85 | 17850.0 | United Kingdom | 11.10 | 2010.0 | 12.0 | 4.0 | 2.0 | 8.0
8 | 536366 | 22632 | HAND WARMER RED POLKA DOT | 6.0 | 2010-12-01 08:28:00 | 1.85 | 17850.0 | United Kingdom | 11.10 | 2010.0 | 12.0 | 4.0 | 2.0 | 8.0
9 | 536367 | 84879 | ASSORTED COLOUR BIRD ORNAMENT | 32.0 | 2010-12-01 08:34:00 | 1.69 | 13047.0 | United Kingdom | 54.08 | 2010.0 | 12.0 | 4.0 | 2.0 | 8.0

Last rows

Row | InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country | TotalVentas | Year | Month | Quarter | DayOfWeek | Hour
541899 | 581587 | 22726 | ALARM CLOCK BAKELIKE GREEN | 4.0 | 2011-12-09 12:50:00 | 3.75 | 12680.0 | France | 15.00 | 2011.0 | 12.0 | 4.0 | 4.0 | 12.0
541900 | 581587 | 22730 | ALARM CLOCK BAKELIKE IVORY | 4.0 | 2011-12-09 12:50:00 | 3.75 | 12680.0 | France | 15.00 | 2011.0 | 12.0 | 4.0 | 4.0 | 12.0
541901 | 581587 | 22367 | CHILDRENS APRON SPACEBOY DESIGN | 8.0 | 2011-12-09 12:50:00 | 1.95 | 12680.0 | France | 15.60 | 2011.0 | 12.0 | 4.0 | 4.0 | 12.0
541902 | 581587 | 22629 | SPACEBOY LUNCH BOX | 12.0 | 2011-12-09 12:50:00 | 1.95 | 12680.0 | France | 23.40 | 2011.0 | 12.0 | 4.0 | 4.0 | 12.0
541903 | 581587 | 23256 | CHILDRENS CUTLERY SPACEBOY | 4.0 | 2011-12-09 12:50:00 | 4.15 | 12680.0 | France | 16.60 | 2011.0 | 12.0 | 4.0 | 4.0 | 12.0
541904 | 581587 | 22613 | PACK OF 20 SPACEBOY NAPKINS | 12.0 | 2011-12-09 12:50:00 | 0.85 | 12680.0 | France | 10.20 | 2011.0 | 12.0 | 4.0 | 4.0 | 12.0
541905 | 581587 | 22899 | CHILDREN'S APRON DOLLY GIRL | 6.0 | 2011-12-09 12:50:00 | 2.10 | 12680.0 | France | 12.60 | 2011.0 | 12.0 | 4.0 | 4.0 | 12.0
541906 | 581587 | 23254 | CHILDRENS CUTLERY DOLLY GIRL | 4.0 | 2011-12-09 12:50:00 | 4.15 | 12680.0 | France | 16.60 | 2011.0 | 12.0 | 4.0 | 4.0 | 12.0
541907 | 581587 | 23255 | CHILDRENS CUTLERY CIRCUS PARADE | 4.0 | 2011-12-09 12:50:00 | 4.15 | 12680.0 | France | 16.60 | 2011.0 | 12.0 | 4.0 | 4.0 | 12.0
541908 | 581587 | 22138 | BAKING SET 9 PIECE RETROSPOT | 3.0 | 2011-12-09 12:50:00 | 4.95 | 12680.0 | France | 14.85 | 2011.0 | 12.0 | 4.0 | 4.0 | 12.0
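Every row in both sample tables satisfies TotalVentas = Quantity × UnitPrice (for example 6.0 × 2.55 = 15.30). That suggests, though the report does not say so, that TotalVentas was computed this way; it is also consistent with the strong Quantity/TotalVentas correlation and with UnitPrice and TotalVentas having the same number of zeros (2515). A sketch of the check, using Decimal to avoid binary floating-point noise; total_ventas is a hypothetical helper:

```python
from decimal import Decimal

def total_ventas(quantity, unit_price):
    """Hypothetical derivation of the line total: Quantity x UnitPrice."""
    return Decimal(quantity) * Decimal(unit_price)

# (Quantity, UnitPrice, TotalVentas) copied as strings from rows 0, 1, 2, 5, 9, 541902 and 541908.
sample = [
    ("6.0", "2.55", "15.30"),
    ("6.0", "3.39", "20.34"),
    ("8.0", "2.75", "22.00"),
    ("2.0", "7.65", "15.30"),
    ("32.0", "1.69", "54.08"),
    ("12.0", "1.95", "23.40"),
    ("3.0", "4.95", "14.85"),
]
# Decimal compares numerically, so 15.300 equals 15.30.
mismatches = [row for row in sample if total_ventas(row[0], row[1]) != Decimal(row[2])]
print(mismatches)  # []
```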

Duplicate rows

Most frequently occurring

Row | InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country | TotalVentas | Year | Month | Quarter | DayOfWeek | Hour | # duplicates
1613 | 555524 | 22698 | PINK REGENCY TEACUP AND SAUCER | 1.0 | 2011-06-05 11:37:00 | 2.95 | 16923.0 | United Kingdom | 2.95 | 2011.0 | 6.0 | 2.0 | 6.0 | 11.0 | 20
1612 | 555524 | 22697 | GREEN REGENCY TEACUP AND SAUCER | 1.0 | 2011-06-05 11:37:00 | 2.95 | 16923.0 | United Kingdom | 2.95 | 2011.0 | 6.0 | 2.0 | 6.0 | 11.0 | 12
3202 | 572861 | 22775 | PURPLE DRAWERKNOB ACRYLIC EDWARDIAN | 12.0 | 2011-10-26 12:46:00 | 1.25 | 14102.0 | United Kingdom | 15.00 | 2011.0 | 10.0 | 4.0 | 2.0 | 12.0 | 8
347 | 538514 | 21756 | BATH BUILDING BLOCK WORD | 1.0 | 2010-12-12 14:27:00 | 5.95 | 15044.0 | United Kingdom | 5.95 | 2010.0 | 12.0 | 4.0 | 6.0 | 14.0 | 6
474 | 540524 | 21756 | BATH BUILDING BLOCK WORD | 1.0 | 2011-01-09 12:53:00 | 5.95 | 16735.0 | United Kingdom | 5.95 | 2011.0 | 1.0 | 1.0 | 6.0 | 12.0 | 6
528 | 541266 | 21754 | HOME BUILDING BLOCK WORD | 1.0 | 2011-01-16 16:25:00 | 5.95 | 15673.0 | United Kingdom | 5.95 | 2011.0 | 1.0 | 1.0 | 6.0 | 16.0 | 6
529 | 541266 | 21755 | LOVE BUILDING BLOCK WORD | 1.0 | 2011-01-16 16:25:00 | 5.95 | 15673.0 | United Kingdom | 5.95 | 2011.0 | 1.0 | 1.0 | 6.0 | 16.0 | 6
3146 | 572344 | M | Manual | 48.0 | 2011-10-24 10:43:00 | 1.50 | 14607.0 | United Kingdom | 72.00 | 2011.0 | 10.0 | 4.0 | 0.0 | 10.0 | 6
4249 | 578289 | 23395 | BELLE JARDINIERE CUSHION COVER | 1.0 | 2011-11-23 14:07:00 | 3.75 | 17841.0 | United Kingdom | 3.75 | 2011.0 | 11.0 | 4.0 | 2.0 | 14.0 | 6
186 | 537224 | 70007 | HI TEC ALPINE HAND WARMER | 1.0 | 2010-12-05 16:24:00 | 1.65 | 13174.0 | United Kingdom | 1.65 | 2010.0 | 12.0 | 4.0 | 6.0 | 16.0 | 5
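The "# duplicates" column counts how often each identical row occurs. A sketch of that grouping with collections.Counter on hypothetical rows, reduced to (InvoiceNo, StockCode, Quantity) for brevity:

```python
from collections import Counter

def most_frequent_duplicates(rows, top=10):
    """Group identical rows and list those seen more than once, most frequent first."""
    counts = Counter(rows)  # rows must be hashable, e.g. tuples
    repeated = [(row, n) for row, n in counts.most_common() if n > 1]
    return repeated[:top]

# Hypothetical rows as (InvoiceNo, StockCode, Quantity) tuples.
rows = [
    ("555524", "22698", 1.0),
    ("555524", "22698", 1.0),
    ("555524", "22698", 1.0),
    ("555524", "22697", 1.0),
    ("555524", "22697", 1.0),
    ("536365", "85123A", 6.0),
]
repeated = most_frequent_duplicates(rows)
for row, n in repeated:
    print(row, n)
# ('555524', '22698', 1.0) 3
# ('555524', '22697', 1.0) 2

counts = Counter(rows)
extra_copies = sum(n - 1 for n in counts.values() if n > 1)  # rows repeating an earlier row: 3
repeated_groups = sum(1 for n in counts.values() if n > 1)   # distinct rows that repeat: 2
```

An overall duplicate count could mean either the number of extra copies or the number of distinct repeated rows; the report does not say which definition its 4,879 figure uses, so the sketch computes both.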